Abstract

In this article we define new Frechet features for random cumulative
distribution functions using contrasts. These contrasts allow us to construct
Wasserstein costs, and our new features minimize the average costs just as
the Frechet mean minimizes the mean square Wasserstein-2 distance. An
example of such a new feature is the median, and more generally the quantiles.
From these definitions, we are able to define sensitivity indices when the
random distribution is the output of a stochastic code. Associated to
the Frechet mean we extend the Sobol indices, and more generally the indices
associated to a contrast that we previously proposed.
Introduction

Nowadays the output of many computer codes is not only a real multidimensional
variable but frequently a function computed on so many points that it
can be considered as a functional output. In particular this function may be the
density or the cumulative distribution function (c.d.f.) of a real random
variable (phenomenon). In this article we focus on the case of a c.d.f. output.
To analyze such outputs one needs to choose a distance to compare various
c.d.f.s. Among the many possibilities offered by the literature we have chosen
the Wasserstein distances (for more details on Wasserstein distances we refer
to [?]). Actually, for one-dimensional probability distributions the
Wasserstein-$p$ distance is simply the $L^p$ distance between random variables
simulated from a universal (uniform on $[0,1]$) simulator $U$:

$$W_p^p(F, G) = \int_0^1 |F^-(u) - G^-(u)|^p \, du = \mathbb{E}\,|F^-(U) - G^-(U)|^p,$$

where $F^-$ is the generalized inverse of $F$. This means that using
Wasserstein distances amounts to comparing various c.d.f.s from various codes on a
same simulation space, which seems very natural in many situations. The most
relevant cases seem to be p = 2 and p = 1, and in this paper we will work with. 
In this article, we consider the problem of defining a generalized notion of
barycenter of random probability measures on $\mathbb{R}$. It is a well known fact that
the set of Radon probability measures endowed with the 2-Wasserstein distance
is not a Euclidean space. Consequently, to define a notion of barycenter for
random probability measures, it is natural to use the notion of Frechet mean
[?], which is an extension of the usual Euclidean barycenter to non-linear spaces
endowed with non-Euclidean metrics. If $Y$ denotes a random variable with
distribution $P$ taking its values in a metric space $(\mathcal{M}, d_{\mathcal{M}})$, then a Frechet mean
(not necessarily unique) of the distribution $P$ is a point $m^* \in \mathcal{M}$ that is a global
minimizer (if any) of the functional

$$J(m) = \frac{1}{2} \int_{\mathcal{M}} d_{\mathcal{M}}^2(m, y) \, dP(y), \quad \text{i.e.} \quad m^* \in \arg\min_{m \in \mathcal{M}} J(m).$$


In this paper, a Frechet mean of a random variable $Y$ with distribution $P$ will
also be called a barycenter. For random variables belonging to nonlinear metric
spaces, a well-known example is the computation of the mean of a set of planar
shapes in Kendall's shape space [?], which leads to the Procrustean means
studied in [?]. Many properties of the Frechet mean in finite dimensional
Riemannian manifolds (such as consistency and uniqueness) have been investigated
in [?, ?, ?, ?, ?, ?].


This article is an attempt to use these tools and some extensions for
analyzing computer code outputs in a random environment, which is the subject
of computer code experiments. In the first section we define new contrasts for
random c.d.f.s by considering generalized "Wasserstein" costs. From this, in the
second section we define new features in the spirit of the Frechet mean, which we
call Frechet features. Then we propose some examples. The next two sections
are devoted to a sensitivity analysis of random c.d.f.s, first from a Sobol point
of view, which we then generalize to a contrast point of view as in [?].


1 Wasserstein distances and Wasserstein costs for 
unidimensional distributions 

For any $p \ge 1$ we may define a Wasserstein distance between two probability
distributions on $\mathbb{R}^d$, denoted $F$ and $G$ (their cumulative distribution
functions, c.d.f.s), by:

$$W_p^p(F, G) = \min_{X \sim F,\, Y \sim G} \mathbb{E}\,\|X - Y\|^p,$$

where the random variables (r.v.'s) have c.d.f. $F$ and $G$ ($X \sim F$, $Y \sim G$),
assuming that $X$ and $Y$ have finite moments of order $p$. We call Wasserstein-$p$
space the space of all c.d.f.s of r.v.'s with finite moments of order $p$.

As previously mentioned, in the unidimensional case where $d = 1$, it is well
known that $W_p^p(F, G)$ is explicitly computed by:

$$W_p^p(F, G) = \int_0^1 |F^-(u) - G^-(u)|^p \, du = \mathbb{E}\,|F^-(U) - G^-(U)|^p.$$

Here $F^-$ and $G^-$ are the generalized inverses of the c.d.f.s $F$ and $G$, which are
increasing with limits 0 and 1, and $U$ is a r.v. uniform on $[0,1]$. Of course $F^-(U)$ and
$G^-(U)$ have c.d.f. $F$ and $G$.
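As a rough numerical illustration of the quantile formula above, the following sketch approximates $W_p^p$ between two empirical distributions by evaluating both empirical generalized inverses on a midpoint grid of $(0,1)$. The helper names `empirical_quantile` and `wasserstein_p_pow`, the grid size and the sample choices are illustrative assumptions, not part of the original text.

```python
import numpy as np

def empirical_quantile(sample, u):
    """Generalized inverse F^-(u) = inf{x : F(x) >= u} of the empirical c.d.f. of `sample`."""
    xs = np.sort(np.asarray(sample, dtype=float))
    idx = np.ceil(np.asarray(u) * xs.size).astype(int) - 1
    return xs[np.clip(idx, 0, xs.size - 1)]

def wasserstein_p_pow(x, y, p=2, n_grid=1000):
    """Approximate W_p^p(F, G) = int_0^1 |F^-(u) - G^-(u)|^p du on a midpoint grid."""
    u = (np.arange(n_grid) + 0.5) / n_grid   # midpoints of a regular partition of (0, 1)
    diff = empirical_quantile(x, u) - empirical_quantile(y, u)
    return float(np.mean(np.abs(diff) ** p))

rng = np.random.default_rng(0)
x = rng.normal(size=500)
# A pure shift by 3 moves every quantile by 3, so W_2^2 should be close to 9.
w22 = wasserstein_p_pow(x, x + 3.0, p=2)
```

The grid mean plays the role of the integral over $u$; a finer grid or an exact treatment of the step functions would be needed for samples of very different sizes.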

This result extends to more general contrast functions. 

Definition 1.1 We call contrast function any application $c$ from $\mathbb{R}^2$ to $\mathbb{R}$
satisfying the "measure property" $\mathcal{P}$ defined by

$$\mathcal{P}: \quad \forall x \le x' \text{ and } \forall y \le y', \quad c(x', y') - c(x', y) - c(x, y') + c(x, y) \le 0,$$

meaning that $c$ defines a negative measure on $\mathbb{R}^2$.

Example 1.1 $c(x, y) = -xy$ satisfies the $\mathcal{P}$ property.

Remark 1 If $c$ satisfies $\mathcal{P}$ then any function of the form $a(x) + b(y) + c(x, y)$
satisfies $\mathcal{P}$. For instance $(x - y)^2 = x^2 + y^2 - 2xy$ satisfies $\mathcal{P}$.

Remark 2 More generally if $C$ is a convex real function then $c(x, y) = C(x - y)$
satisfies $\mathcal{P}$. This is the case of $|x - y|^p$, $p \ge 1$.
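To make the squared-distance case concrete, the mixed difference can be expanded directly for $c(x, y) = (x - y)^2$: the terms in $x^2$ and $y^2$ cancel and only the cross term remains. This short verification is added for illustration only.

```latex
\begin{align*}
c(x',y') - c(x',y) - c(x,y') + c(x,y)
  &= -2\,\bigl(x'y' - x'y - xy' + xy\bigr) \\
  &= -2\,(x'-x)(y'-y) \;\le\; 0
  \qquad \text{for } x \le x',\ y \le y'.
\end{align*}
```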

Definition 1.2 We define the Skorohod space $\mathcal{D} := \mathcal{D}([0,1])$ of all distribution
functions, that is, the space of all non-decreasing functions from $\mathbb{R}$ to $[0,1]$ that
are cad-lag with limit 0 (resp. 1) at $-\infty$ (resp. $+\infty$), equipped with the supremum
norm.

Definition 1.3 (The $c$-Wasserstein cost) For any $F \in \mathcal{D}$, any $G \in \mathcal{D}$ and
any positive contrast function $c$, we define the $c$-Wasserstein cost by

$$W_c(F, G) = \min_{X \sim F,\, Y \sim G} \mathbb{E}\,(c(X, Y)) < +\infty.$$

The following theorem can be found in [?].

Theorem 1.2 (Cambanis, Simons, Stout [?]) Let $c$ be a function from $\mathbb{R}^2$
taking values in $\mathbb{R}$. Assume that it satisfies the "measure property" $\mathcal{P}$. Then

$$W_c(F, G) = \int_0^1 c(F^-(u), G^-(u)) \, du = \mathbb{E}\, c(F^-(U), G^-(U)),$$

where $U$ is a random variable uniformly distributed on $[0,1]$.
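The theorem can be checked by brute force on tiny samples. For two empirical distributions with the same number of atoms, the quantile coupling pairs the $i$-th smallest point of one sample with the $i$-th smallest point of the other, and the sketch below compares its mean cost with the minimum over all one-to-one pairings. The function names and the three test contrasts are illustrative assumptions.

```python
import itertools
import numpy as np

def comonotone_cost(x, y, c):
    """Mean cost when the i-th smallest x is paired with the i-th smallest y."""
    return float(np.mean(c(np.sort(x), np.sort(y))))

def brute_force_cost(x, y, c):
    """Minimum mean cost over all one-to-one pairings (only feasible for tiny samples)."""
    x, y = np.asarray(x), np.asarray(y)
    return min(float(np.mean(c(x, y[list(perm)])))
               for perm in itertools.permutations(range(len(y))))

rng = np.random.default_rng(0)
x, y = rng.normal(size=6), rng.normal(loc=1.0, size=6)
costs = {
    "squared": lambda a, b: (a - b) ** 2,
    "absolute": lambda a, b: np.abs(a - b),
    "minus product": lambda a, b: -a * b,
}
# For contrasts satisfying the measure property, each gap should vanish up to rounding.
gaps = {name: comonotone_cost(x, y, c) - brute_force_cost(x, y, c)
        for name, c in costs.items()}
```

The enumeration visits all $6! = 720$ pairings, so it only serves as a sanity check; the sorted pairing itself costs a single sort.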

At this point we may notice that in a statistical framework one encounters
many contrasts that are defined via a convex function. Actually many features
of probability distributions can be characterized via such a contrast function.
For instance an interesting case is that of the quantiles. Applying the previous
remark we get:

Proposition 1.1 For any $\alpha \in (0,1)$ the contrast function (pinball function)
associated to the $\alpha$-quantile,
$c_\alpha(x, y) = (1 - \alpha)(y - x)\mathbf{1}_{x - y < 0} + \alpha(x - y)\mathbf{1}_{x - y > 0}$,
satisfies $\mathcal{P}$.
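The measure property of the pinball function can also be probed numerically. The sketch below draws random rectangles $x \le x'$, $y \le y'$ and checks the sign of the mixed difference; passing such a randomized check is only evidence, not a proof. The function names, the sampling range and the tolerance are assumptions made for this illustration.

```python
import numpy as np

def pinball(alpha):
    """c_alpha(x, y) = (1 - alpha)(y - x) if x < y, alpha (x - y) if x > y."""
    return lambda x, y: np.where(x < y, (1 - alpha) * (y - x), alpha * (x - y))

def passes_measure_property(c, n_trials=10_000, seed=0, tol=1e-9):
    """Randomized check of c(x',y') - c(x',y) - c(x,y') + c(x,y) <= 0 for x <= x', y <= y'."""
    rng = np.random.default_rng(seed)
    xs = np.sort(rng.uniform(-5, 5, size=(n_trials, 2)), axis=1)
    ys = np.sort(rng.uniform(-5, 5, size=(n_trials, 2)), axis=1)
    x, x2, y, y2 = xs[:, 0], xs[:, 1], ys[:, 0], ys[:, 1]
    mixed = c(x2, y2) - c(x2, y) - c(x, y2) + c(x, y)
    return bool(np.all(mixed <= tol))

checks = {
    "pinball, alpha=0.25": passes_measure_property(pinball(0.25)),
    "c(x, y) = -x*y": passes_measure_property(lambda x, y: -x * y),
    "c(x, y) = +x*y": passes_measure_property(lambda x, y: x * y),  # expected to fail
}
```

The last entry is a deliberate counterexample: for $c(x, y) = xy$ the mixed difference equals $(x'-x)(y'-y) \ge 0$.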


This result is the starting point of the definition of some new features of
random c.d.f.s.

2 Extension of the Frechet mean to other features

A Frechet mean $\mathcal{E}X$ of a r.v. $X$ taking values in a metric space $(\mathcal{M}, d)$ is defined
as (whenever it exists):

$$\mathcal{E}X \in \operatorname{argmin}_{\theta \in \mathcal{M}} \mathbb{E}\, d(X, \theta)^2.$$

That means that it minimizes the contrast $\mathbb{E}\, d(X, \theta)^2$, which is an extension
of the classical contrast $\mathbb{E}\|X - \theta\|^2$ in $\mathbb{R}^d$.

Adopting this point of view we can define a "Frechet feature" associated to
a convenient contrast function.

Now we consider a probability space $(\Omega, \mathcal{A}, \mathbb{P})$ and a measurable application
$\mathbb{F}$ from $\Omega$ to $\mathcal{D}$. Take $c$ a positive contrast (satisfying property $\mathcal{P}$) and define,
analogously to the Frechet mean, the Frechet feature associated to $c$, or
contrasted by $c$, as follows:


Definition 2.1 Assume that $\mathbb{F}$ is a random variable taking values in $\mathcal{D}$. Let
$c$ be a non-negative contrast function satisfying the property $\mathcal{P}$. We define a
$c$-contrasted feature $\mathcal{E}_c\mathbb{F}$ of $\mathbb{F}$ by:

$$\mathcal{E}_c\mathbb{F} \in \operatorname{argmin}_{G \in \mathcal{D}} \mathbb{E}\,(W_c(\mathbb{F}, G)).$$

Of course this definition coincides with the Frechet mean in the Wasserstein-2
space when using the "contrast function" $c(F, G) = W_2^2(F, G)$.

Theorem 2.1 If $c$ is a positive cost function satisfying the property $\mathcal{P}$, if the
application defined on $(\omega, u) \in \Omega \times (0,1)$ by $\mathbb{F}^-(\omega, u)$ is measurable
and if $\mathcal{E}_c\mathbb{F}$ exists and is unique, we have:

$$(\mathcal{E}_c\mathbb{F})^-(u) = \operatorname{argmin}_{s \in \mathbb{R}} \mathbb{E}\, c(\mathbb{F}^-(u), s).$$

That is, $\mathcal{E}_c\mathbb{F}$ is the inverse of the function taking at $u$ the value of the $c$-contrasted
feature of the real r.v. $\mathbb{F}^-(u)$. For instance the Frechet mean in the Wasserstein-2
space is the inverse of the function $u \mapsto \mathbb{E}\,(\mathbb{F}^-(u))$.
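In practice the theorem suggests a pointwise recipe: given draws of the random c.d.f., evaluate their quantile functions on a common grid and minimize the contrast column by column. The sketch below does this for the squared contrast (pointwise mean) and the pinball contrast (pointwise order statistic of rank $\lceil n\alpha \rceil$, which is one empirical $\alpha$-quantile). The function name `frechet_feature`, the logistic base model and the sample sizes are assumptions made for this illustration.

```python
import numpy as np

def frechet_feature(q, contrast="squared", alpha=0.5):
    """q[i, j] = F_i^-(u_j) for draws F_1, ..., F_n of the random c.d.f.
    Returns the quantile function, on the same grid, of the empirical c-contrasted feature."""
    q = np.asarray(q, dtype=float)
    if contrast == "squared":            # c(x, y) = (x - y)^2 -> pointwise mean
        return q.mean(axis=0)
    if contrast == "pinball":            # c_alpha -> pointwise alpha-quantile
        k = min(max(int(np.ceil(alpha * q.shape[0])) - 1, 0), q.shape[0] - 1)
        return np.sort(q, axis=0)[k]     # order statistic of rank ceil(n * alpha)
    raise ValueError(f"unknown contrast: {contrast!r}")

rng = np.random.default_rng(0)
u = (np.arange(100) + 0.5) / 100
base = np.log(u / (1 - u))                       # logistic quantile function (assumed model)
a, b = rng.normal(size=300), rng.uniform(0.5, 1.5, size=300)
q = a[:, None] + b[:, None] * base[None, :]      # row i: quantile function of the i-th draw
mean_feature = frechet_feature(q, "squared")
median_feature = frechet_feature(q, "pinball", alpha=0.5)
```

Because every row of `q` is non-decreasing, both outputs are non-decreasing as well, so each one is again a quantile function on the grid.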
Remark 3 Here, we proposed a general framework on $\mathbb{F}$ and made some strong
assumptions on existence, uniqueness and measurability. But one can construct
explicit parametric models for $\mathbb{F}$. We refer to [?] for such an example. In particular
in [?], the authors used some results of [?] that ensure measurability for some
parametric models on $\mathbb{F}$.

Another example is the Frechet median. A contrast function defining the median
in $\mathbb{R}$ is $|x - y|$. An immediate extension to the Wasserstein-1 space is to consider
the "contrast function" $c(F, G) = W_1(F, G)$. Thus we obtain the Frechet median
of a random c.d.f. as:

$$(\mathrm{Med}(\mathbb{F}))^-(u) \in \mathrm{Med}(\mathbb{F}^-(u)).$$

More generally we can define an $\alpha$-quantile of a random c.d.f., $q_\alpha(\mathbb{F})$, as:

$$(q_\alpha(\mathbb{F}))^-(u) \in q_\alpha(\mathbb{F}^-(u)),$$

where $q_\alpha(X)$ is the set of the $\alpha$-quantiles of $X$ taking its values in $\mathbb{R}$.
Proof of Theorem 2.1.
Since $c$ satisfies $\mathcal{P}$ we have:

$$\mathbb{E}\, W_c(\mathbb{F}, G) = \mathbb{E} \int_0^1 c(\mathbb{F}^-(u), G^-(u)) \, du = \int_0^1 \mathbb{E}\, c(\mathbb{F}^-(u), G^-(u)) \, du,$$

by Fubini's theorem.

Now for all $u \in (0,1)$ the quantity $\mathbb{E}\, c(\mathbb{F}^-(u), G^-(u))$ is minimal when $G^-(u)$ is
a feature of $\mathbb{F}^-(u)$ contrasted by $c$. Noticing that this results in an increasing and
cad-lag function, the theorem follows. □


3 Example 

In this section we illustrate our definitions through an example. 

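Let $F_0$ be an increasing absolutely continuous c.d.f. (hence $F_0^{-1}$ exists), $X$ a r.v.
with distribution $F_0$, and $M$ and $\Sigma$ two real r.v.'s, with $\Sigma > 0$. We consider the random
c.d.f. $\mathbb{F}$ of $\Sigma X + M$. We have:

$$\mathbb{F}(x) = F_0\!\left(\frac{x - M}{\Sigma}\right) \quad \text{and} \quad \mathbb{F}^{-1}(u) = \Sigma F_0^{-1}(u) + M.$$

As is well known, the Frechet mean of $\mathbb{F}$ is given by
$(\mathcal{E}(\mathbb{F}))^{-1}(u) = \mathbb{E}\Sigma \, F_0^{-1}(u) + \mathbb{E}M$, thus

$$\mathcal{E}(\mathbb{F})(x) = F_0\!\left(\frac{x - \mathbb{E}M}{\mathbb{E}\Sigma}\right).$$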

Now using the $\alpha$-quantile contrast
$c_\alpha(x, y) = (1 - \alpha)(y - x)\mathbf{1}_{x - y < 0} + \alpha(x - y)\mathbf{1}_{x - y > 0}$
and following our definition, we define the "$\alpha$-quantile" of $\mathbb{F}$:

$$(q_\alpha(\mathbb{F}))^{-1}(u) = q_\alpha(\Sigma F_0^{-1}(u) + M).$$

Assuming that $\Sigma = 1$ it simplifies to $q_\alpha(\mathbb{F})(x) = F_0(x - q_\alpha(M))$. When
$M = 0$ we have $q_\alpha(\mathbb{F})(x) = F_0\!\left(\frac{x}{q_\alpha(\Sigma)}\right)$ (see figure (??)).
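The two special cases of this example can be reproduced on simulated data. The sketch below assumes, for illustration only, that $F_0$ is the Exp(1) distribution (so that $F_0^{-1} \ge 0$), draws either $M$ or $\Sigma$ at random, and computes the pointwise empirical $\alpha$-quantile of the sampled quantile functions. The helper `empirical_quantile`, the sampling laws and the sample sizes are all hypothetical choices.

```python
import numpy as np

def empirical_quantile(a, alpha, axis=0):
    """Order statistic of rank ceil(n * alpha), an empirical alpha-quantile along `axis`."""
    s = np.sort(a, axis=axis)
    n = s.shape[axis]
    k = min(max(int(np.ceil(alpha * n)) - 1, 0), n - 1)
    return np.take(s, k, axis=axis)

rng = np.random.default_rng(1)
alpha = 0.9
u = (np.arange(200) + 0.5) / 200           # grid on (0, 1)
q0 = -np.log1p(-u)                         # F_0^{-1} for the Exp(1) law (assumed F_0)

# Case Sigma = 1: F^{-1}(u) = F_0^{-1}(u) + M
m = rng.normal(size=1001)
feature_shift = empirical_quantile(q0[None, :] + m[:, None], alpha, axis=0)

# Case M = 0: F^{-1}(u) = Sigma * F_0^{-1}(u), with Sigma > 0
sigma = rng.uniform(0.5, 2.0, size=1001)
feature_scale = empirical_quantile(sigma[:, None] * q0[None, :], alpha, axis=0)
```

With these choices `feature_shift` coincides with `q0` shifted by the empirical $\alpha$-quantile of `m`, and `feature_scale` with `q0` multiplied by the empirical $\alpha$-quantile of `sigma`, in line with the two simplified formulas.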

Once these features are defined, and referring to the computer experiment framework,
we propose in the next section a sensitivity analysis of these Frechet features of
a random c.d.f. seen as the stochastic output of a computer code.


4 Sensitivity indices for a random c.d.f.

4.1 Sobol index 


A very classical problem in the study of computer code experiments (see [?]) is
the evaluation of the relative influence of the input variables on some numerical
result obtained by a computer code. This study is usually called sensitivity
analysis in this paradigm and has been widely studied (see for example [?], [?],
[?] and references therein). More precisely, the numerical result of interest $Y$ is
seen as a function of the vector of the distributed inputs $(X_i)_{i=1,\dots,d}$ ($d \in \mathbb{N}^*$).
Statistically speaking, we are dealing here with the noise-free nonparametric
model

$$Y = f(X_1, \dots, X_d), \qquad (1)$$

where $f$ is a regular unknown numerical function on the state space
$E_1 \times E_2 \times \dots \times E_d$ on which the distributed variables $(X_1, \dots, X_d)$ are living. Generally,
the inputs are assumed to be stochastically independent and sensitivity analysis
is performed by using the so-called Hoeffding decomposition (see [?] and [?]).
In this functional decomposition $f$ is expanded as an $L^2$ sum of uncorrelated
functions involving only a part of the random inputs. For any subset $v$ of
$I_d = \{1, \dots, d\}$ this leads to an index called the Sobol index ([?]) that measures
the amount of randomness of $Y$ carried by the subset of input variables $(X_i)_{i \in v}$.
Without loss of generality, let us consider the case where $v$ reduces to a singleton.
Let us first recall some well known facts about the Sobol index. The global Sobol
index quantifies the influence of the r.v. $X_i$ on the output $Y$. This index is based
on the variance (see [?],[?]): more precisely, it compares the total variance of $Y$
to the expected variance of the variable $Y$ conditioned by $X_i$,

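$$S_i = \frac{\mathrm{Var}(\mathbb{E}[Y | X_i])}{\mathrm{Var}(Y)}. \qquad (2)$$

By the property of the conditional expectation it can also be written as

$$S_i = \frac{\mathrm{Var}(Y) - \mathbb{E}(\mathrm{Var}[Y | X_i])}{\mathrm{Var}(Y)}. \qquad (3)$$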


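In view of this formula we can define a Sobol index for the Frechet mean of a
random c.d.f. $\mathbb{F} = h(X_1, \dots, X_d)$. Actually we define
$\mathrm{Var}(\mathbb{F}) = \mathbb{E}\, W_2^2(\mathbb{F}, \mathcal{E}(\mathbb{F}))$, and

$$S_i(\mathbb{F}) = \frac{\mathrm{Var}(\mathbb{F}) - \mathbb{E}(\mathrm{Var}[\mathbb{F} | X_i])}{\mathrm{Var}(\mathbb{F})}.$$

From Theorem 2.1 we get:

$$\mathrm{Var}(\mathbb{F}) = \mathbb{E} \int_0^1 |\mathbb{F}^-(u) - \mathcal{E}(\mathbb{F})^-(u)|^2 \, du = \mathbb{E} \int_0^1 |\mathbb{F}^-(u) - \mathbb{E}\,\mathbb{F}^-(u)|^2 \, du = \int_0^1 \mathrm{Var}(\mathbb{F}^-(u)) \, du.$$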

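And the Sobol index is now:

$$S_i(\mathbb{F}) = \frac{\int_0^1 \mathrm{Var}(\mathbb{F}^-(u)) \, du - \int_0^1 \mathbb{E}\, \mathrm{Var}[\mathbb{F}^-(u) | X_i] \, du}{\int_0^1 \mathrm{Var}(\mathbb{F}^-(u)) \, du} = \frac{\int_0^1 \mathrm{Var}(\mathbb{E}[\mathbb{F}^-(u) | X_i]) \, du}{\int_0^1 \mathrm{Var}(\mathbb{F}^-(u)) \, du}.$$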

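As a toy example, applying this to our previous example $\mathbb{F}(x) = F_0\!\left(\frac{x - M}{\Sigma}\right)$,
where $M$ and $\Sigma$ play the role of influential random variables, we find:

$$S_\Sigma = \frac{\mathrm{Var}\,\Sigma + 2\,\mathrm{cov}(\Sigma, M)\,\mathbb{E}\xi}{\mathrm{Var}\,\Sigma + \mathrm{Var}\,M + 2\,\mathrm{cov}(\Sigma, M)\,\mathbb{E}\xi}, \qquad S_M = \frac{\mathrm{Var}\,M + 2\,\mathrm{cov}(\Sigma, M)\,\mathbb{E}\xi}{\mathrm{Var}\,\Sigma + \mathrm{Var}\,M + 2\,\mathrm{cov}(\Sigma, M)\,\mathbb{E}\xi},$$

where $\xi$ has c.d.f. $F_0$, since $\mathbb{E}\xi = \int_0^1 F_0^{-1}(u) \, du$.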

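In practice $M$ and $\Sigma$ depend upon numerous random variables $(X_1, \dots, X_d)$;
the Sobol index with respect to $X_i$ then becomes:

$$S_i = \frac{\mathrm{Var}\,\mathbb{E}[\Sigma | X_i] + 2\,\mathrm{cov}(\mathbb{E}[\Sigma | X_i], \mathbb{E}[M | X_i])\,\mathbb{E}\xi + \mathrm{Var}\,\mathbb{E}[M | X_i]}{\mathrm{Var}\,\Sigma + \mathrm{Var}\,M + 2\,\mathrm{cov}(\Sigma, M)\,\mathbb{E}\xi}.$$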
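The ratio $\int_0^1 \mathrm{Var}(\mathbb{E}[\mathbb{F}^-(u) | X_i])\,du \,/ \int_0^1 \mathrm{Var}(\mathbb{F}^-(u))\,du$ can be estimated by Monte Carlo. The sketch below uses a pick-freeze scheme, a standard estimator that is swapped in here and not taken from the text: two runs share the same $X_i$ but use independent copies of the other inputs, and their covariance estimates $\mathrm{Var}(\mathbb{E}[\,\cdot\, | X_i])$. It assumes independent $\Sigma$ and $M$ and a standard normal $F_0$ (so $\mathbb{E}\xi = 0$ and $\int_0^1 F_0^{-1}(u)^2\,du = 1$); all names and distributions are illustrative.

```python
import numpy as np
from statistics import NormalDist

def sobol_index_cdf(q, q_frozen):
    """Pick-freeze estimate of  int Var(E[F^-(u)|X_i]) du / int Var(F^-(u)) du.
    q[k, j] and q_frozen[k, j] are F^-(u_j) for two runs sharing the same X_i
    but using independent copies of all other inputs."""
    cov = np.mean(q * q_frozen, axis=0) - q.mean(axis=0) * q_frozen.mean(axis=0)
    var = q.var(axis=0)
    return float(cov.mean() / var.mean())   # grid means approximate the integrals over u

rng = np.random.default_rng(0)
n = 20_000
u = (np.arange(100) + 0.5) / 100
q0 = np.array([NormalDist().inv_cdf(t) for t in u])   # assumed F_0: standard normal

sigma = rng.uniform(1.0, 2.0, size=n)                  # Var(Sigma) = 1/12
m = rng.normal(0.0, 0.5, size=n)                       # Var(M) = 1/4
m_copy = rng.normal(0.0, 0.5, size=n)                  # independent copy of M

q = sigma[:, None] * q0 + m[:, None]
q_frozen = sigma[:, None] * q0 + m_copy[:, None]       # same Sigma, fresh M
s_sigma = sobol_index_cdf(q, q_frozen)
```

Under these assumptions the index of $\Sigma$ is $\mathrm{Var}\,\Sigma / (\mathrm{Var}\,\Sigma + \mathrm{Var}\,M) = 0.25$, so `s_sigma` should land near that value, up to Monte Carlo noise and the truncation of the tails by the grid.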

4.2 Sensitivity index associated to a contrast function 

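Formula (3) can be extended to more general contrast functions. The contrast
function naturally associated to the mean of a real r.v. is $c(x, y) = |x - y|^2$.
We have $\mathbb{E}Y = \operatorname{argmin}_{\theta \in \mathbb{R}} \mathbb{E}\, c(Y, \theta)$ and $\mathrm{Var}(Y) = \min_{\theta \in \mathbb{R}} \mathbb{E}\, c(Y, \theta)$. Thus the
numerator of $S_i$ is the variation between the minimum value of the contrast
and the expectation of the minimum of the same contrast when conditioning by
the r.v. $X_i$. Hence for a feature of a real r.v. associated to a contrast function
$c$ we defined a sensitivity index (see [?]):

$$S_{i,c} = \frac{\min_{\theta \in \mathbb{R}} \mathbb{E}\, c(Y, \theta) - \mathbb{E} \min_{\theta \in \mathbb{R}} \mathbb{E}[c(Y, \theta) | X_i]}{\min_{\theta \in \mathbb{R}} \mathbb{E}\, c(Y, \theta)}.$$

Along the same lines, we now define a sensitivity index for a $c$-contrasted
feature of a random c.d.f. by:

$$S_{i,c} = \frac{\min_{G \in \mathcal{W}} \mathbb{E}\, W_c(\mathbb{F}, G) - \mathbb{E} \min_{G \in \mathcal{W}} \mathbb{E}[W_c(\mathbb{F}, G) | X_i]}{\min_{G \in \mathcal{W}} \mathbb{E}\, W_c(\mathbb{F}, G)}.$$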
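The computation of $S_{i,c}$ simplifies when $c$ satisfies the property $\mathcal{P}$ and
assuming the uniqueness of $\mathcal{E}_c\mathbb{F}$:

$$S_{i,c} = \frac{\mathbb{E} \int_0^1 c(\mathbb{F}^-(u), (\mathcal{E}_c\mathbb{F})^-(u)) \, du - \mathbb{E}\left\{ \int_0^1 c(\mathbb{F}^-(u), (\mathcal{E}_c[\mathbb{F} | X_i])^-(u)) \, du \right\}}{\mathbb{E} \int_0^1 c(\mathbb{F}^-(u), (\mathcal{E}_c\mathbb{F})^-(u)) \, du},$$

where $\mathcal{E}_c[\mathbb{F} | X_i]$ is the $c$-contrasted feature conditional on $X_i$ (i.e. with respect
to the conditional distribution of $\mathbb{F}$), also assumed to be unique.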

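For instance if $c = |x - y|$, $(\mathcal{E}_c\mathbb{F})^-(u)$ is the "median" (assumed to be unique)
of the random variable $\mathbb{F}^-(u)$ and:

$$S_{i,c} = \frac{\mathbb{E} \int_0^1 |\mathbb{F}^-(u) - \mathrm{Med}(\mathbb{F}^-(u))| \, du - \mathbb{E}\left\{ \int_0^1 |\mathbb{F}^-(u) - \mathrm{Med}[\mathbb{F}^-(u) | X_i]| \, du \right\}}{\mathbb{E} \int_0^1 |\mathbb{F}^-(u) - \mathrm{Med}(\mathbb{F}^-(u))| \, du}.$$

The same holds for any $\alpha$-quantile, using the corresponding contrast function
$c_\alpha$, but with a less readable formula.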
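A nested Monte Carlo sketch of the contrast-based index may help fix ideas: an outer loop draws $X_i$, an inner loop draws the remaining inputs, and the features are computed pointwise in $u$ on a grid. The function `contrast_index`, the logistic choice of $F_0$, the distributions of $\Sigma$ and $M$ and the sample sizes are assumptions for this illustration; nested sampling is one possible estimation scheme among others.

```python
import numpy as np

def contrast_index(q_nested, cost, feature):
    """Nested Monte Carlo estimate of S_{i,c}.
    q_nested[a, b, j] = F^-(u_j) for the a-th draw of X_i and the b-th draw of the
    remaining inputs; `cost(x, y)` is the contrast, `feature(sample, axis)` its minimizer."""
    n_out, n_in, n_grid = q_nested.shape
    pooled = q_nested.reshape(n_out * n_in, n_grid)
    g = feature(pooled, 0)                        # unconditional feature at each u_j
    total = cost(pooled, g).mean()                # estimates min_G E W_c(F, G)
    g_cond = feature(q_nested, 1)                 # conditional feature, one per draw of X_i
    conditional = cost(q_nested, g_cond[:, None, :]).mean()
    return float((total - conditional) / total)

abs_cost = lambda x, y: np.abs(x - y)
median = lambda sample, axis: np.median(sample, axis=axis)

rng = np.random.default_rng(0)
u = (np.arange(50) + 0.5) / 50
q0 = np.log(u / (1 - u))                          # assumed F_0: standard logistic
sigma = rng.uniform(1.0, 2.0, size=(200, 1, 1))   # outer draws: X_i = Sigma
m = rng.normal(0.0, 0.5, size=(200, 200, 1))      # inner draws: M
s_sigma_med = contrast_index(sigma * q0 + m, abs_cost, median)
```

Swapping `abs_cost` and `median` for a pinball cost and an empirical $\alpha$-quantile gives the corresponding quantile-based index; since the inner features are fitted on the same draws used to evaluate the cost, the estimate is slightly biased upward for small inner samples.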


5 Conclusion 

This article is an attempt to define interesting features for a functional output
of a computer experiment, namely a random c.d.f., together with its sensitivity
analysis. This theory is based on contrast functions that allow us to compute
Wasserstein costs. In the same way as the Frechet mean for the Wasserstein-2
distance, we have defined features that minimize some contrasts made of these
Wasserstein costs. Straightforwardly from the construction of these features we
have developed a proposal for sensitivity analysis, first of Sobol type and
then extended to sensitivity indices associated to our new contrasts. We intend
to apply our methodology to an industrial problem: the PoD (Probability of
Detection of a defect) in a random environment. In particular we hope that our
$\alpha$-quantiles will provide a relevant tool to analyze that type of data.



